Dealing with Sensitive Collection Content
Preparing and selecting data for public access
The Creative User Empowerment (CUE) project aimed to make a large amount of collection data available to the public. This, however, poses challenges for internal collection documentation: preparing and selecting data sets that can be made accessible to the public involves a lot of work. An important approach within the project was therefore to use machine learning (ML) technologies to analyse and improve metadata, and to find ways to sensibly simplify or restructure this work.
The collection of the Badisches Landesmuseum includes the holdings of the Staufen Picture Archive (Außenstelle Südbaden). The Alwin Tölle collection contains images depicting children whose publication under the CC0 licence is curatorially and legally questionable. Therefore, not all collection content should be made accessible to the public without restriction.
In a cooperation project with Kiel University of Applied Sciences, the students Oliver Gorczyca and Anas Arodake took on this challenge and investigated how artificial intelligence methods can help to recognise sensitive content in the collection and exclude it from publication.
The development of an object recognition system tailored to museum collections
The project focused on the detection and localisation of minors in digital images provided by the Badisches Landesmuseum. Methods were developed to identify images with children in order to prevent their publication. A pre-trained person detection model was first used to perform a preliminary search and separate out the images that contain people. The same model was then used to pre-annotate these images with bounding boxes around each person. The pre-annotated images served as the basis for further training of a custom object detection model that can recognise children, or other labels that are helpful for a curation or pre-curation process.
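The filtering and pre-annotation steps described above can be sketched roughly as follows. This is a minimal illustration, not the students' implementation: the detector output is stubbed with example data, and the names `has_person`, `to_label_studio_task`, `build_tasks`, the 0.5 score threshold and the `label`/`image` control names are assumptions. The output follows the Label Studio pre-annotation format, in which box coordinates are given as percentages of the image size.

```python
import json

PERSON_LABEL = "person"   # label name of the hypothetical pre-trained detector
SCORE_THRESHOLD = 0.5     # assumed confidence cut-off, to be tuned on real data


def has_person(detections, threshold=SCORE_THRESHOLD):
    """Return True if at least one confident person detection exists."""
    return any(
        d["label"] == PERSON_LABEL and d["score"] >= threshold
        for d in detections
    )


def to_label_studio_task(image_url, width, height, detections,
                         threshold=SCORE_THRESHOLD):
    """Convert pixel boxes (x_min, y_min, x_max, y_max) into a Label Studio
    task with pre-annotations (coordinates as percentages of image size)."""
    results = []
    for d in detections:
        if d["label"] != PERSON_LABEL or d["score"] < threshold:
            continue
        x_min, y_min, x_max, y_max = d["box"]
        results.append({
            "from_name": "label",  # assumed; must match the labeling config
            "to_name": "image",    # assumed; must match the labeling config
            "type": "rectanglelabels",
            "original_width": width,
            "original_height": height,
            "value": {
                "x": 100.0 * x_min / width,
                "y": 100.0 * y_min / height,
                "width": 100.0 * (x_max - x_min) / width,
                "height": 100.0 * (y_max - y_min) / height,
                "rotation": 0,
                "rectanglelabels": ["Person"],
            },
        })
    return {
        "data": {"image": image_url},
        "predictions": [{"model_version": "person-detector-sketch",
                         "result": results}],
    }


def build_tasks(images):
    """Keep only images that contain persons and pre-annotate them."""
    return [
        to_label_studio_task(img["url"], img["width"], img["height"],
                             img["detections"])
        for img in images
        if has_person(img["detections"])
    ]


# Illustrative detector output; in practice this would come from the model.
EXAMPLE_IMAGES = [
    {"url": "https://example.org/a.jpg", "width": 800, "height": 600,
     "detections": [{"label": "person", "score": 0.92,
                     "box": (200, 150, 400, 450)}]},
    {"url": "https://example.org/b.jpg", "width": 640, "height": 480,
     "detections": [{"label": "dog", "score": 0.88,
                     "box": (10, 10, 100, 100)}]},
]

if __name__ == "__main__":
    print(json.dumps(build_tasks(EXAMPLE_IMAGES), indent=2))
```

Only the first example image yields a task, because the second contains no person detection. The resulting JSON could be imported into a labeling tool, where annotators correct the boxes and assign the final labels.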
In developing the tool, CenterNet (Zhou et al., 2019) was chosen as a suitable model because of its ability to localise people in images, which makes it well suited to automatically identifying images that depict children. The Label Studio tool is also useful for museum curators: it can be used to label images and to work with the predictions of pre-trained models.
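CenterNet, as described in "Objects as Points", represents each object by the centre point of its bounding box: the network predicts a heatmap of centre points, and detections are read off at the local maxima of that heatmap. The following is a simplified sketch of that decoding step in plain Python, not the project's code. The function names are hypothetical, the local offset correction from the paper is omitted, and box sizes are assumed to be given already in input-image pixels.

```python
def find_center_peaks(heatmap, threshold=0.3):
    """Return (row, col, score) for every cell that reaches the threshold
    and is the maximum of its 3x3 neighbourhood."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if score >= max(neighbours):
                peaks.append((r, c, score))
    return peaks


def decode_boxes(heatmap, sizes, stride=4, threshold=0.3):
    """Turn centre peaks plus a predicted (width, height) per cell into
    boxes (x_min, y_min, x_max, y_max, score) in input-image pixels.

    `stride` is the ratio between input and heatmap resolution
    (4 in the paper); sizes are assumed to be in input pixels.
    """
    boxes = []
    for r, c, score in find_center_peaks(heatmap, threshold):
        w, h = sizes[r][c]
        cx, cy = c * stride, r * stride
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes


# Tiny illustrative heatmap for a single class (for example "person").
EXAMPLE_HEATMAP = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.4],
]
# Assumed constant size prediction of 8 x 16 pixels for every cell.
EXAMPLE_SIZES = [[(8, 16)] * 4 for _ in range(4)]

if __name__ == "__main__":
    for box in decode_boxes(EXAMPLE_HEATMAP, EXAMPLE_SIZES):
        print(box)
```

In this toy example the heatmap has two local maxima above the threshold, so two boxes are produced. A real model predicts one such heatmap per class, which is how a custom label such as "child" can be added through further training.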
Resources
Zhou, X., Wang, D. & Krähenbühl, P. (2019). Objects as Points. https://doi.org/10.48550/arXiv.1904.07850
Label Studio: https://labelstud.io/
Gorczyca, O., & Arodake, A. Customized object detection system. https://github.com/B1tstorm/customized_object_detection_system
Gorczyca, O., & Arodake, A. (2023). Customized Object Detection mit Transfer Learning am Beispiel von Bilddaten des Badischen Landesmuseums (PDF)